CLIP Explained | Multi-modal ML